car-tr-kot 

CS  TR-3*24 


X0003  -4-95- 1-0521 
August  1997 


Knowledge-Based  Control  of  Vision  Systems 

f Chandra  Shekhar  J  Sabine  Moisan  i Regis  Vincent 
f Philippe  Burlina  fR.ama  Chellappa 

f  (’enter  for  Automation  Re.-earch 
I'niversiiy  of  Maryland 
College  Park.  Ml)  20742 

J  INRIA  Sophia- AntipaiF 
2004  Route  des  Luciole.- 
Sophia-Antipolis  Cedex.  France 


CAR-TR-867 

CS-TR-3824 


N00014-95-1-0521 
August  1997 


Knowledge-Based  Control  of  Vision  Systems 

■[Chandra  Shekhar  t Sabine  Moisan  i Regis  Vincent 
fPhihppe  Burhna  fRama  Chellappa 

f  Center  for  Automation  Research 
University  of  Maryland 
College  Park,  MD  20742 

|  INRIA  Sophia- Antipolis 
2004  Route  des  Lucioles 
Sophia- Antipolis  Cedex,  France 


Abstract 

We  propose  a  framework  for  the  development  of  vision  systems  that  incorporate,  along  with 
the  executable  programs,  the  syntactic,  semantic  and  strategic  knowledge  required  to  obtain 
optimal  performance.  In  this  approach,  the  user  provides  the  input  data,  specifies  the  vision 
task  to  be  performed,  and  then  provides  feedback  in  the  form  of  qualitative  evaluations  of 
the  result(s)  obtained.  These  assessments  are  interpreted  in  a  knowledge-based  framework 
to  automatically  select  algorithms  and  set  parameters  until  results  of  the  desired  quality  are 
obtained.  In  this  manner  the  vision  system  is  given  the  capacity  to  tune  itself  for  optimal 
performance.  A  system  thus  trained  on  a  small  subset  of  the  input  data  can  then  be  run 
autonomously  on  the  remaining  data  in  a  batch  mode.  This  approach  is  illustrated  on  two  real 
applications,  analysis  of  Synthetic  Aperture  Radar  (SAR)  imagery,  and  detection  of  vehicles  in 
aerial  photographs. 


The  support  of  this  research  by  the  Office  of  Naval  Research  under  Grant  N00014-95-1-0521  is  gratefully 
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1  Introduction 


Vision  systems  'used  in  challenging  operational  environments  should  satisfy  the  conflicting  re¬ 
quirements  of  flexibility  and  convenience.  Flexibility  is  the  ability  to  accommodate  variations 
in  operating  conditions.  Convenience  pertains  to  the  ease  of  operation  of  the  system  by  a  user 
who  is  not  familiar  with  the  technical  details  of  the  algorithms  employed. 

Variations  in  image  characteristics  are  caused  by  a  number  of  factors  such  as  weather, 
lighting  conditions  and  image  acquisition  parameters.  A  vision  system  should  accommodate  a 
reasonable  amount  of  such  variation,  and  should  degrade  gracefully  as  the  image  characteristics 
deviate  from  the  ideal.  Once  can  allow  for  such  variations  by  providing  alternative  algorithms 
for  each  task,  as  well  as  tuning  parameters  for  each  algorithm.  In  most  cases,  a  judicious 
choice  of  algorithms  and  parameters  provides  results  of  acceptable  quality  under  a  wide  range 
of  operating  conditions. 

Vision  systems  in  the  real  world  are  often  utilized  by  users  who,  while  competent  in  the 
visual  analysis  of  images,  may  not  be  familiar  with  the  technical  details  of  the  algorithms  they 
employ.  It  is  not  reasonable  to  expect  the  user  functioning  in  an  operational  situation  to  select 
and  tune  the  algorithms  for  the  task  (s)he  is  required  to  perform.  This  function  is  best  left 
to  the  designer  (the  vision  specialist)  who  may  not  be  available  during  the  system’s  operation. 
It  is  thus  obvious  that  a  vision  system  that  provides  flexibility  in  the  choice  of  algorithms  and 
parameter  values  may  not  be  very  convenient  for  the  user  to  utilize. 

In  order  to  achieve  the  conflicting  goals  of  flexibility  and  convenience,  we  propose  a  knowledge- 
based  framework  to  partially  or  fully  automate  the  reasoning  employed  by  the  vision  specialist 
in  obtaining  satisfactory  results  from  the  system.  The  proposed  framework  is  implemented 
using  the  LAMA  platform  [13,  20].  The  original  vision  algorithms  are  semantically  integrated 
into  this  framework.  The  integrated  system,  shown  schematically  in  Fig.  1,  can  then  be  made 
available  to  the  user.  This  type  of  system  is  capable  of  self-tuning,  i.e.  adapting  to  changes 
in  data  characteristics  and  performance  requirements  with  minimal  external  intervention.  Any 
interaction  with  the  user  is  in  terms  of  qualitative  evaluations  of  results,  and  not  in  terms  of 
algorithms  and  parameters.  In  many  situations,  the  same  processing  task  is  performed  on  a 
large  data  set  consisting  of  hundreds  or  even  thousands  of  images.  In  such  cases,  the  system 
can  be  interactively  tuned  on  some  representative  images,  and  once  satisfactory  performance  is 
achieved,  can  then  be  used  with  fixed  settings  for  batch  processing  of  the  remaining  images  in 
the  data  set. 

The  organization  of  this  paper  is  as  follows.  Section  2  briefly  reviews  related  work.  Sec¬ 
tion  3  discusses  the  basic  concepts  of  knowledge-based  control.  Section  4  presents  the  LAMA 
platform.  In  Sections  5  and  6,  results  of  applying  our  methodology  to  two  candidate  applica¬ 
tions  (Synthetic  Aperture  Radar  (SAR)  image  analysis  and  vehicle  detection  in  aerial  imagery) 
are  presented.  These  applications,  developed  at  the  University  of  Maryland,  were  selected 
firstly  because  they  address  interesting  and  non-trivial  problems,  and  secondly  because  their 
developers  were  available  to  provide  the  required  expertise,  indispensable  for  constructing  the 
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Figure  1:  Architecture  of  a  self-tuning  vision  system.  The  vision  system  has  a  number  of  stages 
of  processing,  with  different  possible  algorithm  choices  at  each  step.  Each  algorithm  may  have 
one  or  more  tunable  parameters.  The  user  evaluates  the  result(s)  of  processing,  and  the  control 
module  uses  this  feedback  to  change  algorithms  or  parameters  in  order  to  improve  the  results. 

knowledge  bases.  The  final  section  contains  the  conclusions  resulting  from  our  work,  as  well  as 
a  list  of  areas  for  further  investigation. 

2  Previous  work 

As  vision  algorithms  and  systems  have  grown  in  power  and  complexity  during  the  past  few 
decades,  there  has  been  a  corresponding  growth  in  software  platforms  tailored  to  vision  system 
development.  Traditional  methods  range  from  graphical  script  generators,  to  vision  toolkits, 
to  object-oriented  protocols  for  data  and  program  interchange.  The  emphasis  is  on  abstracting 
the  types  of  objects  and  computational  geometries  used  in  image  understanding  into  useful 
programming  constructs  [12].  One  of  the  objectives  is  to  enable  the  rapid  design  of  algorithms 
using  a  tool-box  of  pre-existing  constructs,  and  prototyping  of  longer  processing  chains  by 
linking  simpler  elements  together.  The  user  is  often  provided  with  a  visual  programming  en¬ 
vironment  (VPE),  which  enables  him  or  her  to  mix-and-match  between  the  available  methods. 
For  the  most  part,  these  systems  offer  only  syntactic  integration  of  vision  programs.  They  pro¬ 
vide  a  means  to  integrate  code  and  usage  syntaxes,  but  do  not  provide  a  means  to  incorporate 
further  knowledge  about  the  programs. 

Knowledge-based  techniques  can  be  used  in  various  ways  in  the  development  of  vision  sys- 
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terns.  For  an  excellent  survey  of  this  topic,  see  [6].  Knowledge-based  systems,  also  known  as 
expert  systems,  have  been  traditionally  used  for  the  high-level  interpretation  of  images,  and 
for  specific  vision  tasks  such  as  segmentation  (e.g.  [14]).  They  incorporate  mechanisms  for 
the  spatial  and  temporal  reasoning  that  is  characteristic  of  intermediate-  and  high-level  image 
understanding.  These  knowledge-based  systems  are  tailored  towards  specific  tasks,  and  are 
usually  not  generalizable  to  other  tasks. 

The  work  presented  in  this  paper  is  different  from  the  above  two  approaches  (syntactic 
integration  systems  and  expert  systems  for  specific  vision  tasks)  in  the  sense  that  it  proposes 
a  general-purpose  framework  for  knowledge-based  control  of  vision  systems.  The  approach  can 
be  used  to  develop  knowledge-based  semi- autonomous  systems  for  any  vision  application.  This 
is  a  continuation  of  the  work  reported  in  [5, 17]. 

Some  of  the  early  work  with  a  similar  motivation  is  reported  in  [8,  18,  11].  More  recently, 
this  problem  has  been  addressed  in  the  context  of  the  VIDIMUS  project  [1],  with  the  aim  of  de¬ 
veloping  an  intelligent  vision  environment  for  industrial  inspection.  A  knowledge-based  system 
(VDSE)  was  built  within  this  environment,  which  can  automatically  configure  a  vision  system 
for  a  given  inspection  problem.  The  automatic  generation  of  an  image  processing  script  based 
on  a  user  request  and  a  knowledge-based  model  of  an  application  domain  is  addressed  in  [4]. 
In  [7]  vision  algorithm  control  is  modeled  as  a  Markov  decision  problem.  This  model  is  used  to 
automatically  assemble  object  recognition  programs  from  existing  vision  algorithms.  In  [15, 16] 
a  context-based  vision  paradigm  is  proposed,  where  the  basic  aim  is  to  use  contextual  informa¬ 
tion  to  select  methods  and  parameters  in  a  vision  application.  The  authors  emphasize  the  need 
for  explicitly  encoding  semantic  knowledge  about  vision  algorithms  such  as  assumptions  about 
their  use  and  their  inherent  limitations.  The  use  of  contextual  information  derived  from  site 
models  to  construct  control  patches  for  the  self-tuning  of  vision  algorithms  is  discussed  in  [2]. 

3  Knowledge- based  control 

In  a  typical  vision  application,  a  number  of  stages  of  processing  are  involved  in  going  from  the 
raw  input  data  to  the  final  result,  as  shown  in  Fig.  1.  Typically,  at  each  stage  of  processing 
a  number  of  alternative  algorithms  can  be  employed.  Each  of  these  algorithms,  in  turn,  may 
have  one  or  more  tunable  parameters.  These  parameters  may  be  continuously  variable,  or  may 
take  discrete  sets  of  values.  Often,  due  to  uncertainty  in  the  data  and  in  the  problem  model, 
it  is  not  possible  to  predict  beforehand  if  a  given  algorithm  sequence  will  produce  the  desired 
result  for  a  certain  parameter  setting.  It  may  be  necessary  to  start  with  a  rough  guess  as  to 
the  parameters,  execute  the  algorithm  sequence,  examine  the  results,  and  if  necessary,  modify 
the  parameter  values  or  the  selection  of  algorithms,  and  repeat  the  procedure  until  results  of 
the  desired  quality  are  obtained. 

In  this  section,  we  examine  the  types  of  knowledge  used  by  a  vision  specialist  (and  therefore 
required  for  knowledge-based  control),  and  the  implications  for  the  design  of  self-tuning  vision 
systems. 
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3.1  Smart  modules 


From  a  problem-solving  perspective,  the  task  of  solving  a  vision  problem  using  a  given  set 
of  algorithms  involves  three  types  of  knowledge:  syntax,  semantics  and  strategy.  Syntactic 
knowledge  is  information  about  input  and  output  data  types  and  formats,  input  parameters, 
command-line  arguments,  etc.  Information  such  as  memory  required  may  also  be  considered 
as  syntactic  knowledge.  Semantic  knowledge  is  the  vision  specialist’s  expertise  about  the  char¬ 
acteristics  of  the  algorithms,  result  evaluation,  assembly  of  algorithms  for  a  given  task,  etc. 
Strategic  knowledge  pertains  to  the  high-level  decisions  that  should  be  made  about  result  eval¬ 
uation  (by  the  system,  by  the  user,  or  not  at  all)  and  failure  handling  (repair  failure  at  the 
current  module  or  at  another  place  in  the  chain  of  processing,  repair  failure  by  parameter  tuning 
or  by  algorithm  reselection,  etc.). 

The  objective  of  knowledge-based  control  is  to  provide  a  suitable  framework  for  “packaging” 
algorithms  using  the  three  types  of  knowledge  described  above.  Conceptually,  as  shown  in 
Fig.  2,  raw  programs  are  wrapped  with  layers  of  syntactic,  semantic  and  strategic  knowledge 
to  form  “smart”  modules.  This  is  done  in  a  recursive  fashion  for  more  complex  tasks — smart 
modules  can  be  connected  sequentially  (Fig.  3),  in  parallel  (Fig.  4),  or  selectively  (Fig.  5),  and 
the  ensemble  is  then  re- wrapped. 

strategic  wrapper 

_ semantic  wrapper 

_ syntactic  wrapper 

"raw"  program  or  "smart"  module  hierarchy 

i/o  specifications  (data,  parameters,  formats) 
command-line  syntax 


characteristics  sub-module  selection 

parameter  tuning  result  evaluation 

failure  handling  strategy  (local  or  nonlocal) 
operating  mode  (specialist  or  user) 
interactive  or  autonomous  functioning 


Figure  2:  Packaging  a  vision  algorithm  into  a  “smart”  module.  A  “raw”  executable  program 
is  not  useful  to  anyone  except  the  specialist  who  designed  and  implemented  it.  In  order  for 
another  person,  particularly  a  non-specialist,  to  use  the  program,  it  must  be  accompanied  by 
some  knowledge  about  how  to  run  it,  how  to  evaluate  its  results,  how  to  tune  it,  etc.  Smart 
modules  can  themselves  be  connected  in  various  ways  and  re-packaged  for  more  complex  tasks. 
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Figure  3:  A  module  with  a  sequential  decomposition  into  sub-tasks.  Execution  of  Module  1 
implies  sequential  execution  of  Modules  1.1  through  l.n. 


Module  1 


Module  1.1 


and 


Module  1.2 


and 


and 


Module  l.n 


Figure  4:  A  module  with  a  parallel  decomposition  into  sub-tasks.  Execution  of  Module  1  implies 
execution  in  parallel  of  Modules  1.1  through  l.n. 


Module  1 


Module  1.1 


or 


Module  1.2 


or 


or 


Module  l.n 


Figure  5:  A  module  with  different  possible  realizations.  Execution  of  Module  1  implies  the 
execution  of  one  of  Modules  1.1  through  l.n. 

3.2  Self-tuning 

Self-tuning  is  the  ability  of  the  vision  system  to  select  the  appropriate  algorithms  and  parameter 
values  based  on  the  input  data,  contextual  information,  and  feedback  from  the  user  in  the  form 
of  result  evaluations.  In  most  cases,  the  selection  of  algorithms  may  be  performed,  before  any 
data  processing  has  taken  place,  based  on  the  contextual  information  alone  [15].  Parameter 
tuning,  on  the  other  hand,  is  dependent  on  the  characteristics  of  the  specific  data  set.  It  has  to 
be  interleaved  with  the  processing  of  the  data,  based  on  the  operator’s  evaluation  of  the  results. 
Consider  a  vision  system  A  composed  of  m  stages,  with  rat-  parameters  in  the  ith  stage: 

A  =  A\  Ai . . .  Ai . . .  Am 

where  each  Ai  is  of  the  form 

A-i  Ai(pn ,  Pi2i  •  -  •  iPini) 

As  shown  in  Fig.  6  the  entire  vision  system  can  be  regarded  as  consisting  of  a  single  algorithm 
A(PuP2,-  •  -,Pn)-  The  N  parameters  that  tune  this  “black  box”  algorithm  are  the  YT=ini 
parameters  of  the  m  individual  stages.  We  refer  to  a  specific  setting  of  these  N  parameters 
as  an  operating  point  (OP).  We  define  an  acceptable  operating  point  as  a  parameter  setting 
that  yields  satisfactory  performance  for  the  given  data  set.  Our  basic  assumption  is  that  there 
exists  at  least  one  acceptable  operating  point  (or,  in  the  case  of  continuous-valued  parameters, 
operating  region).  In  general,  the  default  parameter  setting  will  not  be  an  acceptable  one.  The 
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Figure  6:  A  simplified  black-box  model  of  a  vision  system  consisting  of  stages  Ai-Am.  Tbe 
parameters  of  the  individual  modules  become  the  parameters  of  the  black  box. 


objective,  then,  is  to  be  able  to  find  an  acceptable  OP  for  the  given  data  set  in  an  efficient 
manner.  If  each  parameter  pi  has  k{  possible  values,  the  total  number  of  parameter  settings 
is  Yliki.  An  exhaustive  search  of  the  N- dimensional  parameter  space  may  therefore  be  too 
computationally  expensive.  Further,  it  is  bound  to  be  extremely  tiresome  for  the  user  who  has 
to  evaluate  the  results  at  each  step  and  provide  feedback.  This  is  where  a  knowledge-based 
approach  is  applicable.  A  vision  specialist  employs  a  problem-solving  strategy  consisting  of 
heuristics,  rules  of  thumb,  etc.,  which  effectively  help  him  or  her  find  a  short  cut  (“tuning 
path”)  to  an  acceptable  operating  point,  as  shown  in  Fig.  7,  without  having  to  explore  the 
entire  parameter  space.  The  strategy  employed  by  the  specialist  has  to  be  integrated  into  the 
vision  system  to  give  it  a  self-tuning  capability. 


Ps 


'  parameter  space  ’ 


acceptable  operating  region 


1  tuning  path 


default  setting 


_ / 


Figure  7:  Parameter  tuning  viewed  as  a  search  for  an  acceptable  operating  point  in  the  pa¬ 
rameter  space.  A  parameter  setting  chosen  from  the  acceptable  operating  region  results  in 
satisfactory  performance  of  the  vision  system.  One  of  the  basic  goals  of  vision  algorithm  con¬ 
trol  is  to  reach  an  operating  point  within  this  region. 
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3.3  Modes  of  operation 


Any  vision  task  A  can  be  hierarchically  decomposed  into  a  set  of  subtasks  A\— Am,  each  of 
which  (say  A, )  may  be  decomposed  further  into  subtasks  An~Aimi  and  so  on.  For  instance,  a 
typical  vision  system  may  consist  of  a  top-level  module  A,  consisting  of  sub-modules  A i  and 
A2,  and  these  may  be  composed  of  elementary  subtasks  An ,  A12,  A2 1,  and  A22.  In  our  model, 
attached  to  each  (sub)task  A...  at  any  level  in  the  hierarchy  is  a  strategy  T  which  contains 
all  the  specialist’s  knowledge  about  the  module:  how/when  it  should  be  used,  how  to  evaluate 
its  performance,  and  how  to  adjust  it  if  improved  performance  is  required.  Depending  on  the 
strategies  available,  this  type  of  a  system  can  function  in  one  of  two  modes:  the  specialist  mode 
or  the  user  mode. 

In  the  specialist  mode,  shown  in  Fig.  8,  all  the  strategies  at  every  level  of  the  hierarchy  are 
available.  In  other  words,  results  at  every  stage,  including  the  intermediate  ones,  are  available 
for  evaluation  by  the  specialist.  This  is  applicable  to  the  test  phase  when  the  specialist  is 
in  the  process  of  testing  the  functioning  of  the  system.  In  the  user  mode,  shown  in  Fig.  9, 
only  the  top-level  strategy  T  is  directly  available  and  only  the  results  of  the  final  stage  are 
available  for  evaluation  by  the  user.  This  mode  is  designed  for  the  operational  phase.  From  a 
control-theoretic  viewpoint,  a  self-tuning  vision  system  can  be  regarded  as  a  closed-loop  system 
where  the  observer  is  the  user,  who  provides  feedback  in  the  form  of  result  evaluations.  This 
feedback  can  be  either  at  the  highest  level  (corresponding  to  the  user  mode),  or  at  all  levels 
(corresponding  to  the  specialist  mode). 
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Figure  8:  Specialist  tuning  mode  for  a  three-level  vision  system.  The  results  of  the  modules  A, 
Ai,  A2  and  An-A22  and  the  corresponding  strategies  T,  Tu  T2  and  Tn-T22  are  available. 

In  the  specialist  mode,  the  availability  of  intermediate  results  enables  linear  planning  for 
the  fine-grain  local  optimization  of  algorithms.  The  user  mode,  however,  necessitates  the  use  of 
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Figure  9:  User  tuning  mode  for  a  three-level  vision  system.  Only  the  results  of  the  top-level 
module  A  and  the  corresponding  strategy  T  are  directly  available.  The  strategies  T\,  Ti  and 
T11-T22  are  available  only  indirectly  through  message  transmission. 

more  complex  reasoning,  since  only  the  final  result  is  available,  based  on  which  any  algorithm 
at  any  stage  of  the  processing  may  have  to  be  retuned. 

3.4  Result  evaluation 

In  the  user  mode,  the  vision  system  is  tuned  based  on  the  evaluation  of  results  by  the  user.  We 
can  define  two  types  of  result  evaluation:  general  and  specific.  General  evaluation  consists  of 
qualitative  global  judgments  about  algorithms  and  results  ( “too  many  false  alarms” ,  “too  many 
missed  detections  at  road  intersections”,  etc.).  Specific  evaluation,  on  the  other  hand,  pertains 
to  particular  objects  or  regions  in  the  result  (“this  airplane  is  a  false  alarm”,  “this  portion  of  the 
image  contains  too  many  false  alarms”,  etc.).  It  can  be  qualitative  or  quantitative  in  nature. 
An  example  of  each  type  of  evaluation  is  shown  in  Fig.  10. 

Both  types  of  evaluation  have  their  advantages  and  drawbacks.  Specific  evaluations  are 
simple,  intuitive,  and  precise.  The  user  is  not  expected  to  make  any  detailed  analysis  of  the 
results.  Further,  for  certain  applications  such  as  target  detection,  specific  evaluations  provide 
a  very  rich  form  of  feedback  to  the  control  engine.  However,  they  are  difficult  to  deal  with  in  a 
general-purpose  framework,  since  they  require  mechanisms  for  reasoning  and  for  user  interaction 
that  are  application-dependent.  This  makes  separation  of  the  application  from  the  knowledge 
base  more  difficult.  General  evaluations  are  of  a  more  complex  nature  than  specific  evaluations, 
since  they  require  more  detailed  feedback  from  the  user  on  the  types  of  errors  present  in  the 
output.  However,  they  are  more  suitable  for  a  general-purpose  framework,  since  they  do  not 
require  any  special  interaction  or  reasoning  mechanisms,  and  are  relevant  to  all  types  of  vision 
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Figure  10:  Example  of  the  two  types  of  result  evaluation.  The  application  here  is  the  detection 
of  vehicles  in  an  aerial  image.  The  user  can  provide  feedback  either  in  the  form  of  general 
remarks  about  the  overall  result,  or  in  more  specific  terms. 

problems.  In  this  paper,  we  deal  primarily  with  general  evaluations. 

4  The  LAMA  platform 

LAMA  is  a  methodology  as  well  as  a  general-purpose  platform  for  developing  intelligent  appli¬ 
cations,  consisting  of  a  kernel  library,  a  knowledge  base  description  language  called  YAKL  (Yet 
Another  Knowledge-base  Language),  verification  and  validation  (V  &  V)  facilities,  a  graphical 
user  interface  and  other  tools,  as  shown  in  Figure  11.  A  complete  description  of  LAMA  is 
beyond  the  scope  of  this  paper;  the  interested  reader  is  referred  to  [20].  For  our  purposes, 
LAMA  may  be  viewed  as  an  architecture  based  on  frames  and  rules  for  encapsulating  the 
problem-solving  knowledge  of  the  vision  specialist. 

A  vision  application  developed  using  LAMA  consists  of  a  set  of  pre-existing  algorithms 
(also  referred  to  as  programs,  modules  or  methods),  a  knowledge  base  (KB)  on  using  these 
algorithms,  and  a  control  (supervision)  engine.  A  vision  functionality  is  the  abstract  represen¬ 
tation  of  a  vision  task.  It  is  realized  in  a  concrete  form  by  one  or  more  operators  corresponding 
to  it.  An  operator  may  be  either  simple,  corresponding  to  an  executable  program,  or  composite, 
represented  by  a  predefined  skeletal  plan.  A  skeletal  plan  describes  a  network  of  connections 
between  operators  (choice,  sequence,  repetition,  etc.)  for  realizing  a  given  functionality.  The 
description  of  an  operator  contains  information  about  its  arguments  (name,  type,  range,  default 
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Figure  11:  The  LAMA  platform 

values,  etc.  of  the  input  data,  output  data  and  parameters),  semantic  information  about  its 
applicability  (in  the  form  of  pre-  and  post-conditions),  as  well  as  criteria  for  parameter  initial¬ 
ization,  result  evaluation,  etc.  For  operators  corresponding  to  real  executable  programs  the 
calling  syntax  is  also  provided.  The  structure  of  operators  is  shown  in  Fig.  12. 

The  functioning  of  the  control  engine  (Fig.  13)  can  be  decomposed  into  a  number  of  phases: 
planning  and  execution  of  programs,  evaluation  of  the  results,  and  repair ,  as  shown  in  Fig.  13. 
The  planning  step  first  builds  a  plan,  or  part  of  a  plan,  which  is  then  executed.  The  results  of 
execution  are  then  assessed  in  the  evaluation  step.  If  the  assessments  are  positive,  the  planning 
process  continues.  If  failures  are  detected,  the  repair  step  invokes  the  appropriate  remedial 
measures  which  may  either  result  in  re-execution  or  re-planning  of  any  part  of  the  hierarchy. 
The  planning,  evaluation  and  repair  steps  are  discussed  in  greater  detail  in  the  following  sections. 

4.1  Planning 

The  basic  planning  mechanism  used  is  hierarchical  script-based  planning.  A  plan  is  a  set  of 
steps  to  attain  the  desired  vision  objective.  It  is  similar  to  the  block  schematics  often  used  in 
the  design  of  signal  processing  algorithms,  but  is  different  in  two  ways: 

(a)  Plan  hierarchy:  each  “box”  in  the  plan  can  be  a  complex  vision  task,  with  its  own  plan, 

(b)  Plan  abstraction:  the  components  of  the  plan  represent  abstract  vision  tasks,  and  not  specific 
algorithms  or  programs.  During  execution,  choice  rules  are  used  for  selection  of  the  operator 
best  suited  for  the  task  at  hand.  They  are  of  the  form: 

if 


the  data  have  property  x 
AND  context  field  f  has  value  y 
AND  the  request  has  a  constraint  z 


then 


choose  an  operator  with  characteristic  b 


Figure  12:  Structure  of  operators  in  LAMA.  There  are  two  types  of  operators,  simple  and  com¬ 
plex.  Simple  operators  are  “packaged”  forms  of  executable  programs,  while  complex  operators 
have  an  associated  decomposition. 


Figure  13:  Control  strategy 


AND  do  not  choose  operator  w 

(The  above  example  shows  only  some  of  the  possible  types  of  premises  and  actions  in  a  choice 
rule.  Many  other  types  of  premises  and  actions  are  possible,  for  choice  rules  as  well  as  for  the 
three  other  kinds  of  rules  described  later.) 

Once  an  operator  is  selected,  it  is  initialized  using  initialization  rules,  of  the  form: 

if 


the  data  have  property  x 
AND  context  field  f  has  value  y 


then 


set  parameter  p  to  value  f(y) 

4.2  Evaluation 

The  performance  of  an  operator  is  analyzed  using  evaluation  rules,  which  are  of  the  form: 

if 


the  output  does  not  satisfy  criterion  c 


then 


judge  plan  entity  to  have  some  quality 
declare  failure 

The  “plan  entity”  could  be  the  current  operator,  a  result  of  processing,  or  in  general,  any 
plan  element  in  the  hierarchy.  Many  other  types  of  evaluation  are  possible,  using  information 
about  the  previous  evaluation,  about  the  number  of  times  the  operator  has  been  (unsuccessfully) 
executed,  about  whether  the  result  is  better  or  worse  than  before,  etc. 

4.3  Repair 

Failure  handling  in  LAMA  is  performed  either  locally  inside  the  operator,  or  nonlocally  by 
message  transmission  to  another  part  of  the  plan. 

Local  failures  are  handled  by  parameter  tuning  using  repair  and  adjustment  rules.  A  repair 
rule  for  local  failure-handling  is  usually  very  simple,  of  the  form: 

if 


true 


then 


re-execute 


When  this  rule  is  applied,  the  operator  is  re-executed,  invoking  one  or  more  adjustment 
rules  of  the  form: 

if 


result  r  has  quality  q 

then 


adjust  parameter  p  using  method  m 

When  the  failure  of  the  current  operator  is  caused  by  the  (undetected)  failure  of  another, 
the  repair  strategy  employs  message  transmission  from  the  current  operator  to  the  offending 
one.  The  repair  rule  in  this  case  is  of  the  form: 

if 

result  r  has  quality  q 

then 


send  message  m  to  operator  o 

The  control  then  switches  to  operator  o,  whose  failure-handling  strategy  is  then  activated, 
based  on  the  message  m  that  has  been  sent  to  it. 

4.4  Integrating  an  application  into  LAMA 

Vision  applications  typically  consist  of  a  number  of  executable  programs  with  associated  syn¬ 
tactic,  semantic  and  strategic  knowledge,  as  described  in  Section  3.1.  The  mechanisms  for 
planning,  evaluation  and  repair  provided  by  LAMA  enable  us  to  package  a  given  vision  system 
into  an  ensemble  of  smart  modules,  enabling  the  user  to  obtain  optimal  performance  with  min¬ 
imal  intervention.  A  knowledge  representation  language,  YAKL,  is  employed  for  this  purpose. 
Mechanisms  are  provided  to  test  the  consistency  of  the  knowledge  base  thus  created  [20].  A 
number  of  different  control  engines  are  available  for  running  the  integrated  application.  In  our 
work,  we  use  the  PEGASE  engine  [21]  which  provides  hierarchical  and  skeletal-based  planning 
interleaved  with  control  of  execution.  A  GUI  is  provided  for  examining  the  knowledge  base  as 
well  as  for  monitoring  the  application  during  execution. 

The  following  two  subsections  describe  how  to  build  a  LAMA/PEGASE  knowledge  base 
(KB)  for  an  application,  and  how  to  run  the  integrated  application. 

Building  a  knowledge  base 

The  first  step  is  the  construction  of  a  set  of  skeletal  plans  for  the  application.  A  plan  consists 
of  an  operator  hierarchy,  where  the  top-level  operator  solves  a  given  vision  task.  The  task 
performed  by  this  operator  is  then  recursively  decomposed  into  sub-tasks,  each  with  an  associ¬ 
ated  complex  or  simple  operator,  until  the  lowest  levels  of  the  operator  hierarchy  contain  only 


simple  operators.  Complex  operators  have  an  operator  decomposition  associated  with  them, 
whereas  simple  operators  specify  the  calling  syntax  of  the  associated  program.  A  goal  instance 
is  created  corresponding  to  the  top-level  operator  functionality. 

The  next  step  is  the  creation  of  the  rules  which  express  the  strategy  of  the  specialist  in 
executing  a  plan.  The  LAMA  architecture  enables  the  systematic  expression  of  the  “rules  of 
thumb”  and  “approximate  reasoning”  which  are  crucial  to  successful  problem-solving.  This  is 
done  by  means  of  the  five  types  of  rules  discussed  earlier.  A  mixture  of  numeric  and  symbolic 
reasoning  may  be  needed  for  all  four  types  of  rules.  Choice  rules,  being  the  simplest,  are 
defined  first.  They  use  the  values  of  context  fields,  data  definitions,  constraints  in  the  request, 
etc.  to  select  an  operator  from  among  the  choices  available.  Initialization  rules  are  defined 
next.  These  may  be  somewhat  more  difficult,  in  that  they  have  to  formalize  the  “rough  initial 
guesses”  that  the  specialist  makes  before  starting  an  vision  task.  Adjustment  rules  are  then 
defined  for  operators  which  have  adjustable  parameters.  Step  sizes  for  parameters  have  to  be 
carefully  chosen  so  that  the  change  in  the  behavior  of  the  algorithm  is  neither  too  sudden  nor 
too  gradual.  Then  evaluation  rules  are  defined.  This  is  a  rather  difficult  task,  since  the  question 
“Are  the  results  good  enough?”  is  often  subjective,  and  appropriate  quality  measures  may  not 
be  readily  available.  However,  a  close  examination  of  the  specialist’s  strategy  often  reveals 
hidden  reasoning  capable  of  being  expressed  in  concrete  terms  as  evaluation  rules.  Finally,  the 
repair  rules  are  defined.  These  rules  determine  the  overall  failure-handling  strategy,  and  are 
crucial  to  the  success  of  the  application. 

Running  an  application  in  LAMA/PEGASE 

The  solving  of  a  problem  starts  with  the  creation  of  a  request,  which  states  the  functionality, 
the  data  on  which  this  is  to  be  achieved,  and  the  context  in  which  the  problem  is  being  solved. 
Using  choice  rules,  the  available  operators  for  the  functionality  are  rank-ordered.  The  best 
operator  is  selected,  and  executed  on  the  given  data  after  the  operator’s  initialization  rules 
have  been  applied.  If  the  operator  is  a  simple  one,  this  corresponds  to  the  execution  of  the 
corresponding  program.  If  it  is  a  composite  one,  requests  for  its  sub-tasks  are  created,  and 
a  tree  of  requests  is  created.  Requests  are  chosen  from  this  tree  and  executed  depending  on 
their  sequencing.  If  the  mode  of  execution  control  calls  for  it,  the  results  of  executing  a  request 
are  judged  using  evaluation  rules,  and  in  the  event  of  a  failure,  either  the  same  operator  is 
re-executed  after  its  parameters  have  been  adjusted  using  the  relevant  rules,  or  the  next  best 
operator  is  applied.  In  the  event  of  successful  execution,  the  next  request  in  the  tree  is  selected 
for  execution.  This  continues,  and  the  execution  terminates  when  the  tree  of  requests  is  empty. 

The  next  two  sections  illustrate  our  approach  using  two  real  applications.  The  first  ap¬ 
plication,  Synthetic  Aperture  Radar  (SAR)  image  analysis,  illustrates  the  specialist  mode  of 
self-tuning,  while  the  second  application,  vehicle  detection,  illustrates  the  user  mode. 


16 


5  Example:  SAR  Image  Analysis 


This  application  is  based  on  the  work  of  Kuttikkad  and  Chellappa  [9].  Its  goal  is  to  analyze 
a  SAR  image  to  identify  semantic  objects  such  as  targets,  buildings,  roads,  and  trees,  and  to 
segment  the  remaining  parts  of  the  image  into  various  categories  such  as  grass,  water  and  bare 
ground.  The  first  step  is  the  detection  of  regions  of  high  backscatter  using  a  Constant  False 
Alarm  Rate  (CFAR)  technique  [9].  In  the  next  step  non-target  pixels  in  the  image  are  classified 
as  grass,  tree,  bare  ground,  road  or  shadow  using  a  Maximum  Likelihood  (ML)  approach. 
Training  data  obtained  from  other  images  of  similar  scenes  is  employed.  This  is  a  preliminary 
classification,  using  no  high-level  information  whatsoever.  A  large  percentage  of  the  pixels  are 
likely  to  be  misclassified. 

Shadow  regions  are  detected  and  then  eroded  and  grown  using  morphological  operations. 
The  same  is  done  for  pixels  classified  as  road.  Very  small  regions  of  either  class  are  eliminated, 
and  the  rest  are  grouped  into  homogeneous  regions.  Shadow  regions  that  are  adjacent  to  a  bright 
streak  (in  the  CFAR  output)  towards  the  sensor  are  classified  as  building  shadows.  Roads  are 
verified  using  a  shape/ size  criterion.  ML  segmentation  is  then  repeated  for  pixels  previously 
misclassified  as  road  and  shadow,  this  time  classifying  them  as  grass,  bare  ground  or  trees.  Tree 
regions  are  grown  using  morphological  operations,  and  verified  using  a  size  argument,  as  well 
as  by  the  presence  of  adjoining  shadow  regions  away  from  the  sensor.  In  the  CFAR  output, 
streaks  corresponding  to  buildings  are  eliminated,  and  the  remaining  target  pixels  are  grouped 
into  clusters.  In  the  final  step,  ML  segmentation  is  repeated,  and  based  on  the  previous  steps, 
pixels  misclassified  as  shadow,  road  or  tree  are  re-classified  into  grass  or  bare  ground. 

Examples  of  results  obtained  by  the  approach  are  shown  in  Fig.  14. 

Knowledge  base 

The  hierarchy  of  operators  for  this  application  is  shown  in  Fig.  15.  The  knowledge  base  consists 
of  the  following  major  components:  21  operators  (13  simple  and  7  complex),  18  sequential  and 
2  choice  links,  and  a  total  of  29  production  rules  (2  choice,  10  initialization,  6  evaluation,  6 
adjustment  and  5  repair).  The  complete  knowledge  base  will  not  be  described  in  detail  in  this 
paper.  Instead,  some  simple  examples  from  the  KB  are  presented  to  give  the  reader  a  feel  for 
the  kinds  of  objects  and  reasoning  involved  in  a  real  application. 

An  example  of  a  complex  operator  is  shown  in  Fig.  16.  The  functionality  is  road  verification, 
which  verifies  road  hypotheses  obtained  by  pixel  classification  and  region  growing.  This  operator 
has  a  decomposition  into  a  choice  between  two  simple  operators,  one  of  which  is  shown  in  Fig.  17. 
The  choice  rules  for  deciding  which  operator  to  select  are  shown  in  Fig.  18.  The  reasoning  is 
as  follows:  in  rural  areas,  roads  are  likely  to  have  longer  unbroken  stretches,  whereas  roads  in 
urban  areas  have  many  intersections.  Hence  an  operator  for  verifying  road  hypotheses  in  rural 
areas  should  use  length  as  a  criterion. 

The  rules  for  initializing  the  parameter  “PFA”  (probability  of  false  alarm)  for  the  operator 
“o-cfar”  are  shown  in  Fig.  19.  This  operator  is  used  to  detect  targets  in  a  SAR  image.  The 
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Figure  14:  Examples  of  results  of  SAR  image  analysis. 


SAR  image  understanding 


Figure  15:  Operator  hierarchy  for  SAR  image  analysis. 


Complex  Operator  { 

name  :  verify-roads 

comment  :  "verify  road  hypotheses  obtained  from  the  previous  steps" 
Functionality  :  road-verification 
Input  Data 

PGM-Image  name  :  road-region-image 

comment  :  "Binary  FGM  image  file  with  black  road  areas" 

Output  Data 

PGM-Image  name  :  road-image 

comment  :  "Output  PGM  image  file  for  storing  detected  shadows" 

Preconditions 

valid  road-region-image 

Postconditions 
valid  road-image 

Body 

verify-road-1  OR  verify-road-2  ; 

Choice  criteria 


Data  Flow 

ver if y-roads.road-region-image/verify-road-1. road-region- image 
verify-roads. road-region-image/ver if y-road-2. road-region- image 
verify-roads.road-image/verify-road-2. road-  image 
verify-roads .road- image/ verify-road-1. road- image 


Figure  16:  Example  of  a  complex  operator 
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Terminal  Operator  {  name  :  verify-road-1 
comment  :  "Extended  fill" 

Functionality  :  road-verification 

Input  Data 

PGM-Image  name  :  road-region- image 

comment  :  "Binary  PGM  image  file  with  black  road  areas" 

Input  Parameters 
Float  name  :  aspect 

comment  :  "Aspect  ratio  threshold  for  road  segments" 
default  :  2. 

Float  name  :  fill-ratio 

comment  :  "Fraction  fill  of  bounding  rectangle  for  valid  road" 
default  :  .6 

Integer  name  :  min-road-length 

comment  :  "min  road  length  in  pixels" 
default  :  25 

Output  Data 

PGM-Image  name  :  road- image 

comment  :  "Output  PGM  image  file  for  storing  detected  shadows" 

1-0  relations  : 

road- image. path  :=  road-region- image .path, 
road- image. basename  :=  road-region- image .basename, 
road- image. extension  :=  ".roads" 

Preconditions 

valid  road-region-image 

Postconditions 
valid  road- image 

Call 

language  :  shell 

syntax  :  rdtwo  road-region- image. get-filename  road- image .get-filename 
-a  aspect  -r  fill-ratio  -1  min-road-length 
program  name  :  rdtwo 

> 


Figure  17:  Example  of  a  simple  operator 


Rule 

name  :  ch-verify-road-1 

comment  :  nRoads  in  rural  areas  have  longer  unbroken  stretches, 
since  there  are  fewer  intersections" 


if 


then 


context . scene-type  ==  rural 
use-operator-of -characteristic  length-based 


Rule 

name 

if 

then 


ch-verify-road-2 
context .scene-type  —  urban 
use-operator-of -characteristic  support-based 


Figure  18:  Examples  of  choice  rules 


Rule 

name  :  init-PFA-1 

comment  :  "Initialize  PFA  heuristically" 
if 

context .noise  ==  5  low 

then 

PFA  := .001 

Rule 

name  :  init-PFA2 

comment  :  "Initialize  PFA  heuristically" 
if 

context .noise  ==  ’high 

then 

PFA  :=  .0001 


Figure  19:  Examples  of  initialization  rules 

parameter  P FA  is  a  threshold  which  determines  the  number  of  bright  pixels  that  are  classified 
as  target  pixels.  The  higher  the  PFA,  the  more  likely  it  is  that  a  given  pixel  is  classified  as  a 
target  pixel.  The  rules  for  evaluating  this  operator  are  shown  in  Fig.  20.  The  user  is  asked 
to  judge  if  the  results  of  target  detection  are  satisfactory.  If  not,  appropriate  action  is  taken 
via  adjustment  rules,  such  as  shown  in  Fig.  21.  An  example  of  the  entire  chain  of  reasoning  is 
shown  in  Fig.  22. 

6  Example:  Vehicle  detection 

The  methodology  discussed  in  Section  4  has  also  been  applied  to  the  detection  of  vehicles  in 
aerial  imagery.  We  have  used  a  simplified  version  of  the  Vehicle  Detector  developed  at  the  Uni¬ 
versity  of  Maryland  [3],  which  detects  and  approximately  localizes  vehicles  of  a  specified  size  and 
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Rule 

name  :  ev-target 

comment  :  "Ask  the  operator  to  judge  if 

the  number  of  target  pixels  looks  ok" 
if 

true 

then 

assess-data-by-user  cfar-image 

(num-target-pixels-correct  num-target-pixels-too-low 

num-target-pixels-too-high) 

Rule 

name  :  ev-target2 
if 

not  assess-data?  cfar-image  num-target-pixels-correct 


then 


assess-operator  failed  repair 


Figure  20:  Examples  of  evaluation  rules 


Rule 

name  :  adj-PFA-1 

comment  :  "If  #  target  pixels  is  too  low,  increase  PFA" 
if 

assess-operator?  detect  detect-PFA-too-low 

then 

adjustment -method  PFA  percent-float, 
adjustment -step  PFA  400, 
increase  PFA 

Rule 

name  :  adj-PFA-2 

comment  :  "If  #  target  pixels  is  too  high,  reduce  PFA" 
assess-operator?  detect  detect-PFA-too-high 


if 

then 


adjustment -method  PFA  percent-float, 
adjustment- step  PFA  80, 
increase  PFA 


Figure  21:  Examples  of  adjustment  rules 
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Figure  22:  Example  of  the  reasoning  used  by  the  control  engine.  The  initial  choice  of  the 
parameter  PFA  is  0.001.  The  operator  is  executed,  and  the  user  examines  the  result  and  judges 
the  number  of  target  pixels  to  be  too  low.  The  parameter  is  then  adjusted  to  0.005,  and  the 
operator  is  re-executed. 
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orientation.  Our  objective  here  is  to  demonstrate  the  application  of  the  approach  presented  in 
this  paper  to  a  simple  existing  vision  application.  We  do  not  attempt  to  improve  on  the  underly¬ 
ing  application  itself  in  any  way.  The  main  stages  of  processing,  shown  in  Fig.  23,  are  as  follows. 


preprocessing 


hypothesis 

generation 


hypothesis 

validation 

Figure  23:  A  simplified  version  of  the  IJMD  vehicle  detector.  The  input  is  an  aerial  image. 
Contours  are  extracted,  and  vehicle  hypotheses  are  generated.  These  hypotheses  are  then 
validated  to  obtain  the  final  result. 

Preprocessing:  Edge  pixels  are  extracted  using  the  Canny  edge  detector.  Both  gradient  magni¬ 
tude  and  gradient  direction  are  computed. 

Hypothesis  generation:  A  modified  generalized  Hough  transform  (GHT)  is  used  to  locate  areas 
corresponding  to  centers  of  candidate  vehicles.  Edge  pixels  vote  for  all  possible  centers  of  ve¬ 
hicle  contours  which  contain  the  pixel.  The  votes  are  collected  in  an  accumulator  array,  and 
thresholded.  The  result  is  a  set  of  hypothesized  vehicle  centers.  Local  “rubber-band”  con¬ 
tour  matching  is  subsequently  applied  to  reject  candidate  vehicles  which  do  not  have  sufficient 
support  boundaries  on  both  sides  of  the  vehicle. 

Hypothesis  verification:  This  stage  resolves  spatial  conflicts  (overlaps)  between  vehicle  hypothe¬ 
ses.  This  is  done  in  three  steps.  In  the  first  step,  the  conflict  resolution  is  done  purely  on  the 
basis  of  the  distances  between  the  centers  of  candidate  vehicles.  If  two  candidate  vehicles  are 
closer  than  a  certain  fraction  of  their  width,  the  one  with  the  greater  boundary  support  is 
retained.  The  second  step  uses  the  size  of  the  overlap  area  between  two  conflicting  vehicles  as 
a  criterion  for  rejecting  the  weaker  vehicle.  In  the  final  step,  the  longitudinal  distance  between 
adjacent  vehicles  lying  on  the  same  axis  is  used  as  a  filtering  criterion. 


6.1  Knowledge  base 


The  operator  hierarchy  for  the  vehicle  detector  is  shown  in  Fig.  24.  The  knowledge  base  is  under 
development.  Currently,  it  consists  of  13  operators  (4  complex  and  9  terminal),  10  sequential 
operator  links,  two  choice  links,  and  a  total  of  37  rules  (2  choice,  1  initialization,  11  assessment, 
8  adjustment  and  15  repair).  As  the  knowledge  base  for  this  application  is  developed  further, 
the  number  of  operators  is  expected  to  increase  only  slightly,  whereas  the  number  of  rules  is 
expected  to  increase  considerably. 
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Figure  24:  Operator  hierarchy  for  the  UMD  vehicle  detector. 


6.1.1  Evaluation  and  repair  strategies 

As  in  any  target  detection  application,  there  are  two  principal  types  of  errors:  missed  detections 
(MDs)  and  false  alarms  (FAs).  The  general  objective  is  to  reduce  both  types  of  errors  as  much 
as  possible.  In  practice,  some  tradeoff  is  made  between  the  MD  rate  and  the  FA  rate.  Currently, 
the  user  is  asked  to  choose  between  the  responses  MD  (too  many  missed  detections),  FA  (too 
many  false  alarms)  and  OK  (results  are  satisfactory).  If  errors  of  both  kinds  are  simultaneously 
present,  as  is  usually  the  case,  the  user  selects  the  error  which  is  more  significant.  If  the 
response  is  not  OK,  the  user  is  further  queried  about  the  type  of  MD  or  FA,  as  shown  in 
Fig.  25.  Currently,  MDs  due  to  the  following  three  situations  are  recognized:  vehicles  too  large 
or  too  small,  vehicles  have  low  contrast,  vehicles  too  tightly  packed.  Four  types  of  FAs  are 
handled:  multiple  hits  from  the  same  vehicle,  false  positives  at  control  lines  (the  lines  used  to 
demarcate  the  different  parking  spots),  puddles/oil  stains  mistaken  for  vehicles,  and  FAs  due 
to  pavement  texture.  Extensive  testing  on  a  diverse  set  of  aerial  images  will  enable  us  to  create 
a  richer  taxonomy  of  errors. 
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poor  insufficient  too  small/  pavement  puddles  and  control  multiple 

contrast  spacing  too  large  texture  oil  stains  lines  hits 

Figure  25:  User  evaluation  of  the  UMD  vehicle  detector.  The  user  looks  at  the  result,  and 
judges  if  there  are  missed  detections  (MDs)  or  false  alarms  (FAs),  and  if  so,  of  what  type. 

The  repair  mechanism  has  a  nested  structure,  and  is  interleaved  with  the  evaluation  mech¬ 
anism.  For  every  allowed  error  subtype  there  are  one  or  more  repair  strategies.  The  repair 
strategies  are  tried  one  after  the  other  until  either  the  error  disappears  or  the  strategies  are 
exhausted.  An  example  is  shown  in  Figure  26.  The  first  run  of  the  system  produces  the  result 
shown  in  the  first  box  in  the  second  row.  Evaluation  rules  are  fired,  asking  the  user  to  judge 
this  result.  The  user  indicates  that  there  seem  to  be  false  alarms,  possibly  due  to  control  lines 
on  the  parking  area.  This  triggers  repair  rules,  which  successively  transmit  this  judgment  to 
the  operator  “verify-hypotheses”  and  then  to  “check-texture” ,  which  is  responsible  for  filtering 
out  such  false  detections.  An  adjustment  rule  in  “check-texture”  is  then  applied,  adjusting 
a  parameter  “contrast-threshold”  from  5  to  15.  This  parameter  is  the  minimum  allowable 
difference  in  mean  grey  level  between  a  candidate  vehicle  and  its  background.  Raising  this 
threshold  eliminates  the  false  alarms  caused  by  the  control  lines,  producing  the  result  shown  in 
the  first  box  of  the  third  row.  The  user  evaluates  this  result  again,  judging  it  to  have  too  many 
missed  detections,  possibly  due  to  the  vehicles  being  too  close  together.  This  judgment  triggers 
a  sequence  of  repair  and  adjustment  rules,  resulting  in  the  decrement  of  a  parameter  which 
specifies  the  factor  by  which  vehicles  have  to  be  scaled  before  deciding  that  they  overlap.  This 
allows  candidate  vehicles  spaced  closer  together  to  be  judged  as  non-overlapping,  and  hence 
acceptable.  The  final  result,  which  the  user  judges  to  be  “OK”,  is  shown  in  the  first  box  on  the 
last  row. 
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Figure  1:  Example  of  repair  strategy. 


7  Conclusions  and  future  work 


This  paper  has  presented  a  methodology  for  adding  flexibility  and  convenience  to  an  existing 
vision  system  by  integrating  the  vision  specialist’s  knowledge  into  it  using  a  knowledge-based 
architecture.  The  proposed  methodology  assumes  that  the  vision  system  has  a  non-empty 
operating  region  at  which  it  yields  satisfactory  results.  It  imitates  the  strategy  of  the  specialist 
in  reaching  a  point  in  this  region  from  a  given  or  default  setting.  The  system  can  thus  self-tune 
in  response  to  the  user’s  evaluations.  This  type  of  system  will  be  of  immense  value  in  making 
the  full  power  of  vision  algorithms  available  to  end-users. 

Our  future  work  will  focus  on  the  following  areas: 

Detecting  failure:  Obviously,  if  no  combination  of  the  tuning  parameters  can  yield  sat¬ 

isfactory  results,  neither  the  specialist  nor  the  self-tuning  framework  will  have  any  chance  of 
succeeding.  On  the  other  hand,  if  the  self-tuning  strategy  does  not  capture  the  full  complexity 
of  the  specialist’s  reasoning,  it  may  fail  in  difficult  cases  even  if  a  solution  exists.  Detecting  this 
failure  may  not  always  be  easy  for  the  user,  since  the  self-tuning  strategy  may  have  loops  and 
other  complex  chains  of  reasoning.  If  the  system  does  not  solve  the  problem  in  a  reasonable 
amount  of  time,  it  should  be  considered  as  having  failed.  Experimentation  on  large  and  diverse 
data  sets  and  constant  refinement  of  the  knowledge  base  will  ensure  that  such  failures  do  not 
occur  too  often. 

Validity  of  the  model:  Closely  related  to  the  problem  of  detecting  failures  is  the  validation 
of  the  basic  self-tuning  model  itself.  This  model  assumes  that  based  on  certain  “symptoms” 
of  the  result,  a  diagnosis  can  be  made  as  to  which  parameter  (or  algorithm)  is  at  fault.  As 
in  [7],  the  vision  system  is  assumed  to  possess  a  “Markovian”  property,  which  allows  order- 
independent  and  mutually  independent  parameter  tuning.  The  problem  of  parameter  tuning 
is  essentially  ill-posed:  there  may  be  many  more  parameters  than  symptoms,  and  a  one-to-one 
mapping  may  not  be  easy  to  establish  in  all  cases.  A  more  complex  problem-solving  model  that 
takes  this  into  account  needs  to  be  investigated. 

Specific  evaluations:  Currently,  result  evaluations  are  in  the  form  of  general  remarks  about 
the  results  obtained,  and  not  about  specific  portions  or  objects  of  the  output.  Our  future  work 
will  incorporate  some  mechanisms  for  handling  specific  evaluations.  The  challenge  is  in  making 
these  mechanisms  as  application-independent  as  possible. 

Quantitative  performance  measures:  There  has  been  some  recent  work  on  statistical 

performance  characterization  of  vision  algorithms  (e.g.  [10]).  Integrating  this  kind  of  tool  into 
a  self-tuning  vision  architecture,  although  a  challenging  task,  is  a  promising  area  of  research, 
because  it  would  add  a  quantitative  dimension  to  the  result  evaluation  process. 
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Learning:  Machine  learning  principles  can  be  applied  to  exploit  the  wealth  of  information 

accumulated  over  a  period  of  time  as  the  vision  system  is  tested  on  different  data  sets  [19]. 
The  current  implementation  of  the  LAMA  platform  does  not  make  the  operational  history 
of  the  self-tuning  process  directly  available  to  the  system  developer.  The  availability  of  this 
information  will  enhance  machine  learning  capabilities,  and  will  also  be  useful  in  performance 
analysis. 

Other  applications:  The  vehicle  detection  application  will  be  upgraded  to  use  a  version 

capable  of  detecting  vehicles  of  all  sizes  and  orientations.  We  propose  to  test  the  methodology 
on  other  candidate  problems  such  as  multisensor  registration,  and  to  validate  it  using  large  and 
diverse  data  sets. 
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